Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering

Authors

  • En-Shiun Annie Lee
  • Antonio Sze-To
  • Andrew KC Wong
  • Daniel Stashuk

Abstract

Protein, RNA and DNA are made up of sequences of amino acids or nucleotides, and these biosequences interact with one another through binding. For example, (1) protein-DNA binding regulates gene transcription [1], and (2) protein-protein binding plays important roles in cell-cycle control and signal transduction [2]. Binding is maintained either by the direct participation or by the assistance of short conserved segments of biosequences called functional elements. Because these elements are important for preserving function, they are well conserved throughout evolution. Recognizing them is therefore essential for an in-depth understanding of biological mechanisms [3] and for applications such as inhibitor design [4]. Although functional elements can be discovered from the three-dimensional structures of biosequences, this approach is limited by the high experimental cost of structure determination. With the advent of new sequencing technologies [5], it is preferable to discover functional elements directly from the abundant biosequence data. Many of these elements are short and of variable length, such as Short Linear Motifs (SLiMs) [6], which play important roles in protein-protein interaction but are only 3 to 15 amino acids long. Such short elements are not captured well by the popular position weight matrices [7]. In this paper, we briefly review an unsupervised pattern discovery tool, Aligned Pattern Clustering (APC), and its software WeMine™ [8-11], which was developed to facilitate the discovery and analysis of patterns in biosequences. Its applications include 1) identifying functional elements in protein sequences [8-11]; 2) revealing the characteristics of functional subgroups within functional elements [12-14]; and 3) identifying co-occurring intra-protein [15,16], inter-protein [17] and protein-DNA functional elements [18,19].
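To make the two ideas in the tool's name concrete (discovering over-represented patterns, then aligning and clustering them), the following is a minimal Python sketch. It is not the APC algorithm or the WeMine™ implementation. It assumes an independent-residue background model, scores fixed-length k-mers with a plain standardized residual, and replaces APC's pattern alignment with a toy step that chains patterns overlapping by all but one residue. The function names `significant_patterns` and `align_overlapping`, the threshold of 1.96 and the toy sequences are all hypothetical choices for illustration.

```python
from collections import Counter, deque
from math import prod, sqrt


def kmer_counts(sequences, k):
    """Count every length-k window across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def significant_patterns(sequences, k, threshold=1.96, min_count=2):
    """Return (pattern, observed, expected, residual) for over-represented k-mers.

    Assumption: residues occur independently with their pooled frequencies,
    and significance is the plain standardized residual (obs - exp) / sqrt(exp).
    """
    residues = Counter("".join(sequences))
    total = sum(residues.values())
    n_windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    hits = []
    for pattern, observed in kmer_counts(sequences, k).items():
        if observed < min_count:
            continue
        expected = n_windows * prod(residues[ch] / total for ch in pattern)
        residual = (observed - expected) / sqrt(expected)
        if residual > threshold:
            hits.append((pattern, observed, expected, residual))
    # Most over-represented patterns first.
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


def align_overlapping(patterns):
    """Toy clustering: link equal-length patterns where one continues the other
    by a single column (they share all but one residue), then place each
    linked group on common column offsets, padding with '-'."""
    links = {p: [] for p in patterns}
    for a in patterns:
        for b in patterns:
            if a != b and len(a) == len(b) and a[1:] == b[:-1]:
                links[a].append((b, 1))   # b sits one column to the right of a
                links[b].append((a, -1))  # a sits one column to the left of b
    seen, clusters = set(), []
    for start in patterns:
        if start in seen:
            continue
        # Breadth-first walk assigns a column offset to every linked pattern.
        offsets, queue = {start: 0}, deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            for nxt, step in links[current]:
                if nxt not in offsets:
                    offsets[nxt] = offsets[current] + step
                    seen.add(nxt)
                    queue.append(nxt)
        low = min(offsets.values())
        width = max(o - low + len(p) for p, o in offsets.items())
        rows = [
            ("-" * (o - low) + p).ljust(width, "-")
            for p, o in sorted(offsets.items(), key=lambda item: (item[1], item[0]))
        ]
        clusters.append(rows)
    return clusters


if __name__ == "__main__":
    toy = ["ACGTACGTAA", "TTACGTGG", "CCACGTCC"]
    hits = significant_patterns(toy, k=4)
    for pattern, observed, expected, residual in hits:
        print(f"{pattern}: observed={observed} expected={expected:.3f} residual={residual:.2f}")
    for cluster in align_overlapping([hit[0] for hit in hits]):
        print("\n".join(cluster))
```

On this toy input the sketch flags ACGT, CGTA and TACG as over-represented and chains them into a single aligned cluster whose columns read TACGTA. The method described in the abstract targets short, variable-length patterns in protein sequences, which this fixed-k sketch does not attempt.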


Similar resources

Comparison Between Unsupervised and Supervise Fuzzy Clustering Method in Interactive Mode to Obtain the Best Result for Extract Subtle Patterns from Seismic Facies Maps

Pattern recognition on seismic data is a useful technique for generating seismic facies maps that capture changes in the geological depositional setting. Seismic facies analysis can be performed using the supervised and unsupervised pattern recognition methods. Each of these methods has its own advantages and disadvantages. In this paper, we compared and evaluated the capability of two unsuperv...

Knowledge Discovery in Biosequences Using Sort Regular Patterns

This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from GenBank database. The experiments show that relatively a simple sort pattern can represent a...

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...

Reports in Informatics Approaches to the Automatic Discovery of Patterns in Biosequences

This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently...

Reports in Informatics Relation Patterns and Their Automatic Discovery in Biosequences

We have extended the pattern language used in PROSITE to enable it to describe dependencies between amino acid residues. We have developed a minimum description length principle based fitness measure evaluating the significance of such patterns in relation to a set of sequences, and an algorithm automatically finding significant patterns in unaligned sequences. Computing experiments are reported show...


Publication date: 2016